Efficient Declustering of Non-uniform Multidimensional Data Using Shifted Hilbert Curves

Authors

  • Hak-Cheol Kim
  • Mario A. López
  • Scott T. Leutenegger
  • Ki-Joune Li
Abstract

Data declustering speeds up retrieval from large data sets by partitioning the data across multiple disks or sites and performing retrievals in parallel. Performance is determined by how the data is broken into "buckets" and how the buckets are assigned to disks. While some work has been done on declustering uniformly distributed low-dimensional data, little has been done on declustering non-uniform high-dimensional data. To decluster non-uniform data, a distribution-sensitive bucketing algorithm is crucial for achieving good performance. In this paper we propose a simple and efficient bucketing algorithm that is sensitive to the data distribution. Our method uses shifted Hilbert curves to adapt to the underlying data distribution. The proposed declustering algorithm performs well compared with previous work, which has mostly focused on bucket-to-disk allocation schemes. Our experimental results show that the proposed declustering algorithm achieves a performance improvement of up to 5 times relative to the two leading algorithms.
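The two steps named in the abstract, bucketing and bucket-to-disk allocation, can be illustrated with a minimal sketch. It is not the authors' shifted-Hilbert-curve algorithm, whose details are not given here. It assumes 2-D points in the unit square, orders them along a plain (unshifted) Hilbert curve, cuts the ordering into equal-count buckets so that dense regions get more buckets, and assigns buckets to disks round-robin. The function names, the grid resolution `order`, and the round-robin allocation are all assumptions made for illustration.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two. This is the standard iterative conversion,
    not the shifted variant proposed in the paper.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_buckets(points, capacity, order=8):
    """Sort points in [0, 1)^2 by Hilbert index and cut into equal-count buckets.

    Equal-count cuts follow the data distribution: dense regions are
    covered by more (spatially smaller) buckets. 'order' is an assumed
    grid resolution of 2**order cells per axis.
    """
    n = 1 << order

    def key(p):
        x = min(int(p[0] * n), n - 1)
        y = min(int(p[1] * n), n - 1)
        return hilbert_index(n, x, y)

    ordered = sorted(points, key=key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def assign_round_robin(num_buckets, num_disks):
    """Assumed allocation: bucket b (in Hilbert order) goes to disk b mod M."""
    return [b % num_disks for b in range(num_buckets)]


if __name__ == "__main__":
    # Skewed sample: most points are concentrated near the origin.
    pts = [((i % 50) / 500.0, (i // 50) / 500.0) for i in range(900)]
    pts += [(i / 100.0, 0.9) for i in range(100)]
    buckets = hilbert_buckets(pts, capacity=32)
    disks = assign_round_robin(len(buckets), num_disks=4)
    print(len(buckets), "buckets spread over", len(set(disks)), "disks")
```

Because neighbouring buckets along the curve tend to be spatially close, giving consecutive buckets to different disks lets a range query touch several disks at once. That is the motivation for the round-robin choice in this sketch.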

Similar resources

Declustering Using Fractals

We propose a method to achieve declustering for Cartesian product files on M units. The focus is on range queries, as opposed to partial match queries that older declustering methods have examined. Our method uses a distance-preserving mapping, namely, the Hilbert curve, to impose a linear ordering on the multidimensional points (buckets); then, it traverses the buckets according to this ordering...

A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries

Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize the parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the resp...

Study of Scalable Declustering Algorithms for Parallel Grid Files

Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known acce...

A multidimensional discrete Hilbert-type inequality

In this paper, using the method of weight coefficients and techniques of real analysis, a multidimensional discrete Hilbert-type inequality with a best possible constant factor is given. The equivalent form and the operator expression with the norm are also considered.

Efficient retrieval of multidimensional datasets through parallel I/O

Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...

Publication date: 2004